Enrichment analyses of meta-analyses: evidence mapping, bibliometrics, and alternative impact metrics

Visualizing study characteristics, hidden risk of bias, societal influence, and research translation

Contributors

Yefeng Yang, Malgorzata Lagisz, Shinichi Nakagawa

Update

Last update Nov. 2023

Preface

This tutorial is a supplement to our paper submitted to Research Synthesis Methods.

We present a framework consisting of three approaches that can enhance meta-analyses: 1) scoping reviews (evidence map), 2) bibliometrics, and 3) alternative impact metrics. These three ‘enrichment’ approaches facilitate the research synthesis of both quantitative and qualitative evidence, along with academic and non-academic influences. While the meta-analysis yields quantitative insights (e.g., overall estimates), the enrichment analyses provide user-friendly summaries of qualitative information on evidence base. Scoping reviews can visualize study characteristics, unravelling knowledge gaps and methodological differences. Bibliometric analysis offers a visual assessment of the non-independent evidence, such as hyper-dominant authors and countries, and funding sources, potentially informing the risk of bias. Impact metric analysis employs alternative metrics to gauge societal influence and research translation (e.g., policy and patent citations) of studies in the meta-analysis. To illustrate the application of this framework, we provide sample visualizations and R code.

Note that these applications are provided for illustrative purposes, and any scientific, clinical, or policy implications should not be drawn from them.

Citation

If our paper and tutorial have helped you, please cite the following paper:

Yefeng Yang, Malgorzata Lagisz, Shinichi Nakagawa. Enriching meta-analyses through scoping review, bibliometrics, and alternative impact metrics: Visualizing study characteristics, hidden risk of bias, societal influence, and research translation. arXiv, 2023.

Contact

If you have any questions, mistakes, or bug to report, please contact corresponding authors:

  • Dr. Yefeng Yang

Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia

Email:

  • Professor Shinichi Nakagawa, PhD, FRSN

Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia

Email:

Set-up

Setting global options will apply those options to all of the following chunks of code in the tutorial.

Our illustrations use R statistical software and existing R packages, which you will first need to download and install.

If you do not have it on your machine, first install R (download). We recommend also downloading RStudio, a popular integrated development environment for coding with R, created by a company named posit (download).

After installing R, you must install several packages that contain necessary functions for performing the analyses in this tutorial. If the packages are archived in CRAN, use install.packages() to install them. To install packages that are not on CRAN and archived in Github repositories, execute devtools::install_github().

The package list is as follows:

tidyverse, here, DT, ggpubr, readxl, metafor, lme4, car, ggplot2, cowplot, visdat, naniar, viridis, ggthemr, pander, formatR, rotl, ape, ggstance, ggtree, flextable, bibliometrix, circlize, igraph.

Custom function

We provide helper functions necessary for our illustrations. To use them, you need to source them. You also can paste the source code into the console, and hit “Enter” to let R “learn” these custom functions.

# custom function
source(here("Function", "custom.R"))

Enrichment 1: Evidence mapping

The first enrichment analysis is evidence mapping (sometimes termed scoping review, evidence review). Incorporating evidence mapping into a meta-analysis offers two benefits. Firstly, it inherits the merits of a conventional evidence map, such as identifying knowledge gaps, directing research priorities, and providing support for funding and policy decisions . Secondly, evidence mapping can inform methodological decisions (e.g., post hoc analyses) and interpretations of meta-analytic findings.

Grid-like plots

We recommend using grid-like graphs, where one dimension (e.g., intervention) is placed on the x-axis, and the other (e.g., outcomes) is placed on the y-axis, with statistics (e.g., the number of studies, number of effects, sample size) displayed at the intersection of the x- and y-axes . Additional dimensions, such as study design, effect size, study quality or RoB (Risk of Bias) measures, can be also incorporated (see more on this point in the next section). This figure was R code was adapted from (1).

Data 1

Hodkinson and colleagues (2) conducted a network meta-analysis to assess the efficacy of different self-management interventions (multidisciplinary case management, regularly supported self-management, and minimally supported self-management) in enhancing the quality of life among asthma patients.

# load data
dat <- read_xlsx(here("Data", "Hodkinson_2020.xlsx"))

# preprocess
dat$vi <- dat$`Std Err2`^2
dat <- dat[, c("Intervention model", "Outcome", "Hedgesg", "vi", "Study name")]
names(dat) <- c("dimension1", "dimension2", "yi", "vi", "study_id")

# show data in a datatable
t1 <- dat %>%
    dfround(3) %>%
    DT::datatable()
t1

Visualization

Figure I caption:

Examples of evidence maps visualizing study characteristics. (A) A typical grid-like graph with intervention variable as the first dimension, outcome variable as the second dimension and the bubble size representing the number of studies. (B) The bubble sizes are changed to represent the number of effect sizes. (C) The colour scale is applied to the bubbles to denote the magnitude of the mean effect size. (D) The population variable is mapped to the shape serving as the third information dimension.

# get estimate for each cell
est_dat <- dat %>% group_by(dimension1, dimension2) %>%
  group_modify(~ meta_aggregate(.x, rho = 0.5)) %>% ungroup()

# traditional map with the number of study
est_dat$dimension1 <- as.factor(est_dat$dimension1)

Box1_map1 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_studies) ) +
  geom_point(alpha = 0.5, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[1]) + 
  labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)") +
  scale_size(range=c(5,10)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + # 
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme_bw() +
  guides(size = "none") +
  theme(legend.position='top', 
        legend.justification='right',
        legend.direction='horizontal', 
        axis.text = element_text(color = "black"),
        axis.title = element_text(color = "black")) +
  geom_text(aes(label = as.character(n_studies)), size = 4, color = "gray10") +
  labs(caption = "The value in the cell is the number of studies") + 
   theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
         axis.text = element_text(size = 12),
         axis.title = element_text(size = 12, face = "bold")) 


# traditional map with the number of effect size
Box1_map2 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_es)) +
  geom_point(alpha = 0.5, color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3]) + 
  labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)") +
  scale_size(range=c(5,10)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme_bw() +
  guides(size = "none") +
  theme(legend.position='top', 
        legend.justification='right',
        legend.direction='horizontal',  
        axis.text = element_text(color = "black"),
        axis.title = element_text(color = "black")) +
  geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") +
  labs(caption = "The value in the cell is the number of effect sizes") +
   theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
         axis.text = element_text(size = 12),
         axis.title = element_text(face = "bold", size = 12))



# with effect size information
Box1_map3 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = n_es, color = estimate)) +
  geom_point(alpha = 0.6) + 
  scale_color_gradient(
    low = "blue",
    high = "red", 
    limits = c(-1,1),
    guide = "colourbar") + 
  labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)", color = "Meta-analytic mean effect size") +
  scale_size(range=c(5,10)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme_bw() +
  guides(size = "none") +
  theme(legend.position='top', 
        legend.justification='right',
        legend.direction='horizontal', 
        axis.text = element_text(color = "black"),
        axis.title = element_text(color = "black")) +
  geom_text(aes(label = as.character(n_es)), size = 4, color = "gray10") +
  labs(caption = "The value in the cell is the number of effect sizes") + 
   theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
         axis.text = element_text(size = 12),
         axis.title = element_text(face = "bold", size = 12))


# 3-dimensional graph - we do not recommend to 3 dimensional graph because this is not that friendly to end-user
dat <- read_xlsx(here("Data","Hodkinson_2020.xlsx"))
dat$vi <- dat$`Std Err2`^2
dat <- dat[, c("Intervention model", "Outcome", "Age group", "Hedgesg", "vi", "Study name")]
names(dat) <- c("dimension1", "dimension2", "dimension3", "yi", "vi", "study_id")

# get estimate each cell 
est_dat <- dat %>% group_by(dimension1, dimension2, dimension3) %>%
  group_modify(~ meta_aggregate(.x, rho = 0.5)) %>% ungroup()

# with the third dimension
Box1_map4 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, 
                      shape = dimension3,
                      size = n_es  
                      )) +
  geom_point(alpha = 0.6,  color = RColorBrewer::brewer.pal(n = 8, name = "Dark2")[5], aes(group = dimension3), position = position_dodge(width= 0.5)) +
 scale_color_gradient(
    low = "blue",
    high = "red", 
    limits = c(-1,1),
    guide = "colourbar") + 
  scale_shape_manual(values = c(15, 19, 17, 18, 10, 4, 3, 7, 8, 13)) +
  labs(x = "Dimension 1 (Intervention)", y = "Dimension 2 (Outcome)", shape = "Diamension 3 (Population)") +
  scale_size(range=c(5,10)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme_bw() +
  guides(size = "none") +
  theme(legend.position='top', 
        legend.justification='right',
        legend.direction='horizontal', 
        axis.text = element_text(color = "black"),
        axis.title = element_text(color = "black")) +
  geom_text(aes(label = as.character(n_es), group = dimension3), size = 4, color = "gray10", position = position_dodge(width = .5)) +
  labs(caption = "The value in the cell is the number of effect sizes") + 
   theme(plot.caption = element_text(size = 10, color = "gray10", face = "italic"),
         axis.text = element_text(size = 12),
         axis.title = element_text(face = "bold", size = 12))


#Box1_map1 + Box1_map2 + Box1_map3 + Box1_map4 + patchwork::plot_layout(ncol = 2, nrow = 2) + patchwork::plot_annotation(tag_levels = "A")
#png(filename = "Box1_map.png", width = 10, height = 10, units = "in", type = "windows", res = 400)
plot_grid(Box1_map1, Box1_map2, Box1_map3, Box1_map4, labels = c('A','B','C','D'), label_size = 14, nrow = 2, ncol = 2)

Figure I shows the landscape of evidence about the effectiveness of the self-management interventions. For example, panels A – B visually represent where randomized controlled trials (RCTs) have been conducted to examine the interventions’ effectiveness, offering insights into which interventions have received more clinical attention. Panel C reveals a demographic bias, as most self-management interventions were trialled on adults. Panel D conveys critical information about both which interventions have been examined in the RCTs and their associated effectiveness. This information can inform funding strategies (e.g., funding missing RCTs, or in-depth investigation) and help clinicians to gauge the volume and effectiveness of their interventions of interest.

Sankey diagrams

We recommend using Sankey or alluvial diagrams to visualize the flow or overlaps in the composition of context-dependence drivers, summarizing their connections and co-linearity and missing data patterns in an accessible manner. Comprehensive and clear visual representation of moderator variables can facilitate customizable evidence synthesis identifying context-specific drivers and delivering more tailored evidence for science, policy, and practice, improving generalizability and transferability of evidence. This figure was R code was adapted from https://github.com/davidsjoberg/ggsankey/blob/main/R/sankey.R.

Data 2

Mertens and colleagues (3) employed a multilevel meta-analytic model to synthesize evidence on the effectiveness of choice architecture interventions (often referred to as nudges) for behaviour change across various techniques, behavioural domains, and other study characteristics (e.g., populations and locations).

# load data
dat <- read.csv(here("Data", "Mertens_2021.csv"))
names(dat)[1] <- "publication_id"
# show data in a data
t2 <- dat %>%
    dfround(3) %>%
    DT::datatable()
t2

Visualization

Figure II caption:

An example of a Sankey diagram showing the flow and change in the composition of moderator variables considered as context-dependence drivers.

# pre-process
dat <- dat %>%
    dplyr::select(type_experiment, intervention_technique, intervention_category,
        domain, population, location)

# png(filename = './Box1_sankey.png', width = 6, height = 5, units = 'in', type
# = 'windows', res = 400)
ggplot(dlong(dat, type_experiment, location, population, intervention_category, domain,
    intervention_technique), aes(x = x, next_x = next_x, node = node, next_node = next_node,
    fill = factor(node), label = node)) + sankey_p(flow.alpha = 0.8, node.color = "transparent") +
    sankey_p_label(size = 3, color = "white", fill = "gray10", alpha = 0.6) + ggsci::scale_color_tron() +
    theme_sankey(base_size = 10) + labs(x = NULL) + theme(legend.position = "none",
    plot.title = element_text(hjust = 0.5), axis.text.x = element_text(color = "black",
        size = 9)) + scale_x_discrete(labels = c("Moderator 1 \n (Experimental \n approach)",
    "Moderator 2 \n (Geographical \n location)", "Moderator 3 \n (Population \n characteristic)",
    "Moderator 4 \n (Architecture \n category)", "Moderator 5 \n (Behavioral \n domain)",
    "Moderator 6 \n (Intervention \n technique)"), position = "top")

Figure II highlights the diversity of experimental designs in the primary studies included, suggesting potential heterogeneity in the meta-analytic evidence. The tested moderator variables display minimal collinearity, indicating that each variable represents a unique contextual influence. Importantly, Figure II provides useful visual clues to identify the contexts requested by decision-makers, facilitating the assessment of the effectiveness of interventions in the context of interest (e.g., target population and location). A follow-up customizable evidence synthesis can be conducted to improve the generalizability and transferability of meta-analytic evidence.

Phylogenetic trees

For phylogenetic meta-analytic analyses, visual representation of species information, such as species phylogenetic trees, can intuitively convey the breadth of taxa and underlying phylogenetic heterogeneity.

Data 3

Sanders and colleagues (4) used a Bayesian meta-analytic model to synthesize evidence regarding the impacts of artificial light at night on physiological, phenological, life history, activity patterns, and population/community-based outcomes. This meta-analysis included more than 180 species. For illustration, we used the subset that focused on physiological outcomes.

# load data
dat <- read_xlsx(here("Data", "Sanders_2021.xlsx"))
# show data in a data
t3 <- dat %>%
    dfround(3) %>%
    DT::datatable()
t3

Visualization

Figure II caption:

An example of a phylogenetic tree visualizing the breadth of taxa and underlying phylogenetic heterogeneity. The effect size estimate for each species was aggregated from multiple estimates within the same species, assuming a constant within-species correlation (in this case, 0.5). Different colours represent different phylogenetic classes.

# some data wrangling
dat$Species <- gsub("\\.", "_", dat$Species)
dat <- dat[!is.na(dat$Species), ]  # 

# check species we have length(unique(dat$Species)) #184 unique species names,
# if no misspelling

# find Open Tree Taxonomy (OTT) IDs for each species
taxa <- tnrs_match_names(names = unique(dat$Species))

# for illustrative purpose, we just delete the species that are not matched
# find location
pos <- which(dat$Species %in% c("Phodopus_sungeroru", "Electra?__sp", "Molgula_sp",
    "Baetis_spp", "Agrostis_tenuis", "Anthoxanthum_odoratum", "Myotis_daubentonii",
    "_Myotis_mystacinus", "_Myotis_brandtii", "_Myotis_nattereri", "_Plecotus_auritus",
    "Pipistrellus_pipistrelles", "_Pipistrellus_nathusii", "Nyctalus_nyctalus", "_Nyctalus_leisleri",
    "_Eptesicus_serotinus_", "All_families", "Pipistrellus_hesperidus/Hypsugo_anchietaia",
    "Myotis?_sp", "MCF7", "7288cct", "A__cristatellus_", "A__evermanni", "A__gundlachi",
    "A__sagrei", "bat_community_", "N__leisleri", "Poecile_sp", "Coleoptera", "_Diptera",
    "_Lepidoptera", "_Erebidae", "_Chironomidae", "_Noctuidae_and_Psychodidae", "Nyctalus_and_Eptesicus_spp",
    "Anoectochilus_roxburghii", "Myotis_pilosatibialis", "Silene_latifolia"))
dat <- dat[-pos, ]

# match again
taxa <- tnrs_match_names(names = unique(dat$Species))  # still some are not matched. But let's resolve it latter

# check whether occur in the synthetic tree
ott_in_tree <- ott_id(taxa)[is_in_tree(ott_id(taxa))]
# length(ott_id(taxa)) - length(is.na(ott_in_tree)) # still 10 did not appear
# in synthetic tree

# check which 10
out_tree <- filter(taxa, !ott_id %in% ott_in_tree)
# out_tree$search_string

dat <- dat[!dat$Species %in% c("Leptocythere_pellucida", "Myotis_daubentonii,_Myotis_mystacinus,_Myotis_brandtii,_Myotis_nattereri,_Plecotus_auritus",
    "Pipistrellus_pipistrelles,_Pipistrellus_nathusii", "Nyctalus_nyctalus,_Nyctalus_leisleri,_Eptesicus_serotinus_",
    "Myotis_lucifugus", "Acyrthosiphont", "Carabidae", "Staphylinidae", "Eptesicus_bottae",
    "Corethrella_spp", "P__pipistrellus", "Myotis", "Coleoptera,_Diptera,_Lepidoptera,_Erebidae,_Chironomidae,_Noctuidae_and_Psychodidae",
    "Pipistrellus"), ]

# match again
taxa <- tnrs_match_names(names = unique(dat$Species))
# check again whether all otts occur in the synthetic tree
ott_in_tree <- ott_id(taxa)[is_in_tree(ott_id(taxa))]
# length(ott_id(taxa)) - length(is.na(ott_in_tree)) # all good

# now every ott occur in the synthesistic tree. But for the sake of brevity, we
# only visualize a subset of the tree only use studies measuring physiology
Physiology <- subset(dat, Category == "Physiology")
# remove NA
Physiology <- subset(Physiology, Species != "NA")
taxa <- tnrs_match_names(names = unique(Physiology$Species))

# make phylo tree
tree <- suppressWarnings(tol_induced_subtree(ott_ids = ott_id(taxa)))
# the tip labels contain OTTs, which means they will not perfectly match the
# species names in our dataset or the taxon map that we created earlier, remove
# the extra information from the tip labels later; with the IDs removed, we can
# use our taxon map to replace the tip labels in the tree with the species
# names from dataset
tree$tip.label <- strip_ott_ids(tree$tip.label, remove_underscores = TRUE)

# decapitalise species names to match with the search string names in taxa
Physiology <- Physiology %>%
    mutate(search_string = tolower(Species))

# align data
Physiology <- left_join(Physiology, dplyr::select(taxa, search_string, unique_name,
    ott_id), by = "search_string")

# create the variables of spp and phylo
Physiology <- Physiology %>%
    mutate(spp = unique_name, phylo = unique_name)

# prepare annotation data species-specific estimates as fixed effect
agg.es <- escalc(measure = "SMD", m1i = Experimental_Mean, m2i = Control_Mean, sd1i = Experimental_SD,
    sd2i = Control_SD, n1i = Experimental_N, n2i = Control_N, data = Physiology) %>%
    aggregate(cluster = phylo, struct = "CS", rho = 0.5, addk = TRUE)
agg.es <- agg.es[c("phylo", "yi", "vi")]
names(agg.es) <- c("Species", "Mean", "SE")
agg.es <- agg.es %>%
    mutate(Lower_bound = Mean - sqrt(SE) * qnorm(0.975), Upper_bound = Mean + sqrt(SE) *
        qnorm(0.975)) %>%
    arrange(Species)
N_obs <- Physiology %>%
    group_by(phylo) %>%
    summarise(N_obs = n())  # calculate sample size
names(N_obs)[1] <- "Species"
fe.spp.es <- left_join(agg.es, N_obs, by = "Species")
tip.label <- data.frame(Species = tree$tip.label)  # extract tip label
fe.spp.es2 <- left_join(tip.label, fe.spp.es, by = "Species")
fe.spp.es2 <- fe.spp.es2 %>%
    mutate(z = Mean/SE, p = pnorm(abs(z), lower.tail = F) * 2)

# get class data
spp.class <- dplyr::distinct(Physiology, phylo, .keep_all = TRUE) %>%
    dplyr::select(Class, phylo)
names(spp.class)[2] <- "Species"
spp.class2 <- left_join(tip.label, spp.class, by = "Species")

# make each panel
tree.p1 <- ggtree(tree, layout = "rectangular", cex = 0.4)
tree.p2 <- tree.p1 %<+% spp.class2 + geom_tiplab(aes(color = Class), size = 3, fontface = "italic",
    align = T, offset = 0.3) + geom_tippoint(aes(color = Class)) + xlim_expand(c(0,
    21), panel = "Tree") + guides(color = "none") + scale_color_viridis_d()
tree.p3 <- tree.p2 + geom_facet(panel = "Effect size", data = fe.spp.es2, geom = ggstance::geom_pointrangeh,
    mapping = aes(x = Mean, xmin = Lower_bound, xmax = Upper_bound, color = Class)) +
    theme_tree2() + theme(strip.background = element_rect(fill = "white"))

# png(filename = 'Box1_phylo.png', width = 6, height = 5, units = 'in', type =
# 'windows', res = 400)
facet_widths(tree.p3, c(Tree = 0.4, `Effect size` = 0.2))

Figure III presents a typical phylogenetic tree revealing the broad coverage of taxa used in artificial light at night experiments, including birds, mammals, insect, reptiles, and arachnids. For a more in-depth statistical analysis, constructing a phylogenetic correlation matrix can quantify the effect of the shared evolutionary history among species in the meta-analytic evidence base.

Enrichment 2: Bibliometric analysis

The second enrichment analysis is bibliometric anlaysis. Studies are often not independent in terms of their conduct, with highly represented authors potentially dominating the production of evidence, limiting methodological diversity and generalizability. Moreover, the publication of false, exaggerated, and falsified effects is believed to be more common in countries with a “publish or perish” culture. For example, meta-scientific evidence suggests that studies from the United States tend to overestimate effect sizes and exhibit a larger publication bias. Therefore, applying bibliometric analysis in a meta-analytic evidence base can reveal the non-independent evidence, extending the scope of RoB assessment from within-study to between-study levels.

Co-authorship network

Bibliometrics can bring two major additions to meta-analyses. Bibliometrics can construct a co-authorship network for the studies included in a meta-analysis. Mapping the co-authorship network plot can intuitively reveal authorship dependence. Network metrics, such as the degree of centrality, can provide quantitative insights. The figure was inspired by Moulin, et al. (5), who originally implemented it using Matlab and VOSviewer.

Data 3

We use the bibliometric data associated to Data 3 to illustrate the construction of co-authorship network in meta-analytic evidence base.

# load data
dat <- read_xlsx(here("Data", "Sanders_2021.xlsx"))
# load bibliographic data
M <- convert2df(here("Data", "Sanders_2021_bib.csv"), dbsource = "scopus", format = "csv") %>%
    suppressMessages()  # note that using here function does work for data wrangling
## 
## Converting your scopus collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
# merge data
dat.M <- left_join(M, dat, by = c(DI = "DOI"))  # DI denote DOI in bib data

Visualization

Figure IV caption:

An example of co-authorship network visualizing the diversity of the research groups and the degree of centralization of the scientific community. The vertices (nodes) and edges (links) denote authors and co-authorships, respectively. The bubbles represent the author clusters. Each colour denotes a co-authorship cluster (or research group).

# construct author collaboration network
NetMatrix <- biblioNetwork(M, analysis = "collaboration",  network = "authors", sep = ";")
net_matrix <- as.matrix(NetMatrix)
diag(net_matrix) <- 0 
g <- graph_from_adjacency_matrix(net_matrix, mode = "lower", weighted = "weight")
# computing centrality measures for each vertex 
V(g)$indegree <-  igraph::degree(g, mode = "in")
V(g)$outdegree <- igraph::degree(g, mode = "out")
V(g)$closeness <- igraph::closeness(g, mode = "total")
V(g)$betweeness <- igraph::betweenness(g, normalized = TRUE)

#  graph
set.seed(2023)
# using random walks to detect the network community
wtc <- cluster_walktrap(g)
# modularity(wtc)
# modularity(g, membership(wtc))
# member <- communities(wtc)
# n_cluster <- sapply(member, length)

#png(filename = "./Box2_author.png", width = 15, height = 15, units = "in", type = "windows", res = 400)
plot(wtc, g, 
     vertex.size = V(g)$indegree/3, # or V(g)$outdegree + 1
     vertex.label = NA,
     edge.arrow.size = .25)

Figure IV visually depicts the co-author network, with vertices (circles) representing authors and edges (lines) representing co-authorships. Highly-represented authors and dominant research groups can produce the majority of the evidence, reducing its methodological diversity and generalisability (certain groups might report certain results that are caused by bias in experimental designs and analyses). In the example meta-analytic data set, we observe nearly author clusters, each of which, on average, included six authors. The largest cluster had 58 authors. To quantitatively detect the effect of hyper-dominant research groups on meta-analytic estimates, a leave-one-out analysis can be employed to compare the meta-analytic effect size estimates of each research group with that of the rest.

We also can do some quantitative analysis:

# vertex and edges attributes can be accessed via the V and E functions,
# respectively we can list what vertex/edge attributes are available:
# list.vertex.attributes(g) list.edge.attributes(g) V(g)$name[1:6]

# extracting each vectex features as a data.frame
stats <- as_data_frame(g, what = "vertices")

# computing quantiles for each vertex attributes
stats_degree <- with(stats, {
    cbind(indegree = quantile(indegree, c(0.025, 0.5, 0.975), na.rm = TRUE), outdegree = quantile(outdegree,
        c(0.025, 0.5, 0.975), na.rm = TRUE), closeness = quantile(closeness, c(0.025,
        0.5, 0.975), na.rm = TRUE), betweeness = quantile(betweeness, c(0.025, 0.5,
        0.975), na.rm = TRUE))
})

# compute some statistics at the graph level:
cbind(size = vcount(g), nedges = ecount(g), density = edge_density(g), recip = reciprocity(g),
    centr = centr_betw(g)$centralization, pathLen = mean_distance(g))
##      size nedges    density recip      centr  pathLen
## [1,]  576   1739 0.01050121     1 0.01110043 2.834336

Country network

Bibliometrics can also identify dominant and unrepresented countries of author affiliation, revealing interdependences between countries. Following similar principles, bibliometrics can identify research location bias (e.g., studies from high-impact or predatory journals), funding source bias, linguistic bias, and time-lag bias. Notably, a quantitative assessment of those factors can be accomplished by conducting subgroup analysis or meta-regression.

Data 3

The same bibliometric data can be to illustrate the construction of country network in meta-analytic evidence base.

# country
M_country <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M_country, analysis = "coupling", network = "countries",
    sep = ";")

net_matrix <- as.matrix(NetMatrix)
# get rid of collaboration with same country
diag(net_matrix) <- 0
# getting rid of lower triangle
net_matrix[lower.tri(net_matrix)] <- 0

# colnames(net_matrix) - change to title case:
colnames(net_matrix) <- str_to_title(colnames(net_matrix))
# rownames(net_matrix) - change to title case:
rownames(net_matrix) <- str_to_title(rownames(net_matrix))
# Fix 'Usa' to 'United States' :
colnames(net_matrix)[colnames(net_matrix) == "Usa"] <- "United States"
rownames(net_matrix)[rownames(net_matrix) == "Usa"] <- "United States"

Visualization

Figure V caption:

An example of a chord diagram showing the epistemological interdependences between different countries of author affiliation in the meta-analytic evidence base. These interdependences are quantified using a bibliographic coupling approach. Two countries are coupled when the cumulative bibliographies of their respective papers share one or more cited references. The coupling strength, an indicator of the dominance, increases as the number of co-cited references between them increases.

# color palette
color <- viridis::viridis(34, alpha = 1, option = "D")
color <- color[sample(1:34)] 

circos.clear()
circos.par(start.degree = 90, 
           gap.degree = 3,  
           points.overflow.warning = FALSE)

#png(filename = "Box2_country.png", width = 5, height = 5, units = "in", type = "windows", res = 400)
chordDiagram(net_matrix, 
             grid.col = color, transparency = 0.1, 
             directional = 1,direction.type = c("arrows", "diffHeight"), diffHeight  = -0.04, annotationTrack = "grid", annotationTrackHeight = c(0.05, 0.1),link.arr.type = "big.arrow",link.sort = TRUE, link.largest.ontop = TRUE,
             preAllocateTracks = 1)
# add text and axis
circos.trackPlotRegion(track.index = 1, panel.fun = function(x, y) {
  xlim = get.cell.meta.data("xlim")
  ylim = get.cell.meta.data("ylim")
  sector.index = get.cell.meta.data("sector.index")
  # add names to the sector. 
  circos.text(x = mean(xlim), y = ylim[1] + .01, 
              labels = sector.index, facing = "clockwise", # clockwise
              niceFacing = TRUE, cex = 0.6, adj = c(0, 0.5))
  # add graduation on axis
  circos.axis(h = "top", labels.cex = 0.01, major.tick.length = 0.01, sector.index = sector.index, track.index = 2, labels.niceFacing = T)
}, bg.border = NA)

Figure V shows the country of affiliation citation network with countries of author affiliation. While there is no obvious indication of dominant countries (with country contributions shown as proportion of the circle’s perimeter), the United States, the United Kingdom, and Germany are the most prominent players in this field and all countries appear to be linked well via article citations. At the statistical follow-up work, a multilevel meta-analytic model with random effects at the levels of author and country of affiliation clusters can be resorted to correct for potential biases in country influence.

Enrichment 3: Altmetric analyses

Analysis of alternative impact metrics, termed as “altmetrics”, is an approach used to estimate research impact beyond academia. It can quantify societal impact by tracking activities on social media (e.g., Facebook and X), Internet pages (e.g., Wikipedia), policy-related documents, and patent applications. Incorporating altmetrics into a meta-analysis can reveal the uptake of research findings by stakeholders outside academia.

We use one meta-analytic data et examining the replicability of the preclinical cancer biology 76 to illustrate the application of almetrics analysis, based on data from Altmetric.

Data 4

The data used for this demonstration is derived from the work of Errington and colleagues (6), who conducted a meta-analysis on the differences between the original effect size estimates and replicated effect size estimates across 50 cancer biology experiments. These experiments were sourced from 23 papers published in high-profile journals, such as Nature, Science, Cell, and PNAS.

# load meta-analytic data
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))
# use a custom function to extract data using API because it takes a while to
# extract data using API, we just upload the pre-extracted data.
altmetrics <- read.csv(here("Data", "altmetrics.csv"))
#-------------------not run-------------------#
# altmetric.crawler <- list(NULL) for (n in 1:length(dat$DOI)) { # JASON format
# altmetric.crawler[[n]] <- try(list(format.Altmetric(getAltmetrics(doi =
# dat$DOI[n]))),silent=TRUE) }

# get lists within lists altmetric.crawler2 <- sapply(altmetric.crawler,
# function(x) {x}) retrieve stats altmetrics <-
# altmetric_summary(altmetric.crawler2) save(data, file = 'data.Rdata')
# write.csv(altmetrics,file = 'Data/altmetrics.csv')
#-------------------not run-------------------#
# show data in a datatable
t4 <- altmetrics %>%
    DT::datatable()
t4

Visualization

Figure VI caption:

Examples of visualization showing the results of impact metric analysis. (A) An orchard plot with Altmetric score as the bubble and impact metric related to indicators of practical application (i.e., patent and policy citation counts) as the bubble size. (B) A grid-like graph where: 1) the color and size of the bubbles correspond to the Altmetric score, and 2) the impact metric counts, related to indicator of practical translation (i.e., patent and policy citation counts), respectively. The grey bubble indicates that the Altmetric score exceeds 400. The categories of “full,” “partial,” and “no replication” denote that replication of studies was as fully replicated, partially replicated, or not replicated, respectively.

altmetrics2 <- altmetrics %>%
    distinct(paper, .keep_all = TRUE)
altmetrics2 <- altmetrics2 %>%
    mutate(count = Policy + Patent)
altmetrics2 <- altmetrics2 %>%
    mutate(group = rep("", nrow(altmetrics2)))

# png(filename = 'Box3_altmetric1.png', width = 5, height = 3, units = 'in',
# type = 'windows', res = 400)
Box3_altmetric1 <- ggplot2::ggplot() + ggbeeswarm::geom_quasirandom(data = altmetrics2,
    ggplot2::aes(y = Altmetric.score, x = group, size = count), fill = "#1B9E77",
    col = "#999999", alpha = 0.8, shape = 21) + ylim(0, 300) + ggplot2::coord_flip() +
    ggplot2::theme_bw() + ggplot2::guides(fill = "none", colour = "none") + ggplot2::theme(legend.position = c(0,
    1), legend.justification = c(0, 1)) + ggplot2::theme(legend.title = ggplot2::element_text(size = 9)) +
    ggplot2::theme(legend.direction = "horizontal") + ggplot2::theme(legend.background = ggplot2::element_blank()) +
    ggplot2::labs(x = "Social media interest", y = "Altmetric score", size = latex2exp::TeX("Policy and patent citations")) +
    # ggplot2::theme(axis.ticks.y = element_blank()) +
ggplot2::theme(axis.text.y = ggplot2::element_text(size = 10, colour = "black", hjust = 0.5,
    angle = 0))

# add altmetrics and transnational info to an evidence map load meta-analytic
# data
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))
altmetrics <- read.csv(here("Data", "altmetrics.csv"))

# add altemtrics to the original data
dat <- dat %>%
    mutate(Altmetric.score = altmetrics$Altmetric.score, policy = altmetrics$Policy,
        patent = altmetrics$Patent)

dat <- dat[, c("Original paper journal", "Replication study fully completed", "Altmetric.score",
    "Paper #")]
names(dat) <- c("dimension1", "dimension2", "score", "study_id")
dat$score <- round(dat$score, 0)

# delete the cases where the replications have not be conducted due to the lack
# of experimental protocol details
dat <- dat[!dat$dimension2 == "Not applicable", ]

# run altmetric_aggregate function through each combination
est_dat <- dat %>%
    group_by(dimension1, dimension2) %>%
    group_modify(~altmetric_aggregate(.x)) %>%
    ungroup()

est_dat1 <- est_dat

# policy and patent counts
dat <- suppressMessages(read_csv(here("Data", "Timothy_2021.csv")))

# add altemtrics to the original data
dat <- dat %>%
    mutate(Altmetric.score = altmetrics$Altmetric.score, policy = altmetrics$Policy,
        patent = altmetrics$Patent)
dat <- dat %>%
    mutate(count = patent + policy)

dat <- dat[, c("Original paper journal", "Replication study fully completed", "count",
    "Paper #")]
names(dat) <- c("dimension1", "dimension2", "count", "study_id")
dat$count <- round(dat$count, 0)

# delete the cases where the replications have not be conducted due to the lack
# of experimental protocol details
dat <- dat[!dat$dimension2 == "Not applicable", ]

# run translation_aggregate function through each cell
est_dat <- dat %>%
    group_by(dimension1, dimension2) %>%
    group_modify(~translation_aggregate(.x)) %>%
    ungroup()

est_dat2 <- est_dat

colnames(est_dat2)[colnames(est_dat2) == "estimate"] <- "count"
est_dat2 <- est_dat2[, 1:3]

# combine
est_dat <- merge(est_dat1, est_dat2, by = c("dimension1", "dimension2"))
est_dat$count <- round(est_dat$count, 1)

est_dat <- est_dat %>%
    mutate(dimension2 = case_when(dimension2 == "No replication" ~ "Incomplete replication",
        dimension2 == "Full replication" ~ "Full replication", dimension2 == "Partial replication" ~
            "Partial replication"))


# png(filename = 'Box3_altmetric2.png', width = 6, height = 6, units = 'in',
# type = 'windows', res = 400)
Box3_altmetric2 <- ggplot(est_dat, aes(x = dimension1, y = dimension2, size = count,
    color = estimate)) + geom_point(alpha = 0.6) + scale_color_gradient(low = "#E6AB02",
    high = "purple", limits = c(0, 400), guide = "colourbar") + labs(x = "Dimension 1 (Journal)",
    y = " Dimension 2 (Replication completion)", color = "Altmetric score") + scale_size_identity() +
    scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) + scale_y_discrete(labels = function(x) str_wrap(x,
    width = 10)) + theme_bw() + guides(size = "none") + theme(legend.position = "top",
    legend.justification = "right", legend.direction = "horizontal", axis.text = element_text(color = "black"),
    axis.title = element_text(color = "black")) + geom_text(aes(label = as.character(count)),
    size = 2.5, color = "gray10", fontface = "bold") + labs(caption = "The value in the cell is the citation count of policies and patents") +
    theme(plot.caption = element_text(size = 8, color = "gray10", face = "italic"))


# png(filename = 'Box3_altmetric.png', width = 5, height = 7, units = 'in',
# type = 'windows', res = 400)
plot_grid(Box3_altmetric1, Box3_altmetric2, labels = c("A", "B"), label_size = 14,
    nrow = 2, ncol = 1, rel_heights = c(1, 2))

Figure VI, panel A, indicates the substantial social media attention earned by cancer biology studies included in the meta-analytic evidence base. A large portion of these studies were mentioned in policy documents and patent applications, indicating a potential degree of practical translation. Notably, full replicated studies exhibited relatively higher Altmetric scores, and larger policy and patent citation counts compared to that of studies that were only partially replicated and not replicated. Among the fully replicated studies, those published in PNAS, Cancer Biology, and Cell, exhibited higher impact metrics than those published in Nature and Science.

License

This documented is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International.

Software and package versions

sessionInfo() %>%
    pander()

R version 4.0.3 (2020-10-10)

Platform: x86_64-w64-mingw32/x64 (64-bit)

locale: _LC_COLLATE=Chinese (Simplified)China.936, _LC_CTYPE=Chinese (Simplified)China.936, _LC_MONETARY=Chinese (Simplified)China.936, LC_NUMERIC=C and _LC_TIME=Chinese (Simplified)China.936

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: igraph(v.1.3.0), circlize(v.0.4.15), bibliometrix(v.4.0.2), flextable(v.0.8.6), ggtree(v.3.7.1.001), ggstance(v.0.3.5), ape(v.5.6-1), cowplot(v.1.1.1), rotl(v.3.0.11), formatR(v.1.11), pander(v.0.6.4), ggthemr(v.1.1.0), viridis(v.0.6.2), viridisLite(v.0.4.0), car(v.3.0-11), carData(v.3.0-4), lme4(v.1.1-26), metafor(v.4.2-0), numDeriv(v.2016.8-1.1), metadat(v.1.2-0), Matrix(v.1.5-3), readxl(v.1.3.1), ggpubr(v.0.4.0), DT(v.0.19), here(v.1.0.1), forcats(v.0.5.2), stringr(v.1.5.0), dplyr(v.1.0.10), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.1), tibble(v.3.1.8), ggplot2(v.3.4.4), tidyverse(v.1.3.1), rmdformats(v.1.0.3) and knitr(v.1.37)

loaded via a namespace (and not attached): pacman(v.0.5.1), utf8(v.1.2.2), tidyselect(v.1.2.0), htmlwidgets(v.1.5.3), FactoMineR(v.2.6), grid(v.4.0.3), clubSandwich(v.0.5.3), munsell(v.0.5.0), codetools(v.0.2-18), ragg(v.1.2.5), statmod(v.1.4.35), rentrez(v.1.2.3), withr(v.2.5.0), colorspace(v.2.0-0), highr(v.0.9), uuid(v.0.1-4), rstudioapi(v.0.13), leaps(v.3.1), ggsignif(v.0.6.3), officer(v.0.6.0), fontLiberation(v.0.1.0), labeling(v.0.4.2), emmeans(v.1.6.3), bit64(v.4.0.5), farver(v.2.1.0), rprojroot(v.2.0.2), coda(v.0.19-4), vctrs(v.0.5.0), treeio(v.1.14.4), generics(v.0.1.0), TH.data(v.1.1-0), xfun(v.0.29), fontquiver(v.0.2.1), R6(v.2.5.1), ggbeeswarm(v.0.6.0), cachem(v.1.0.6), gridGraphics(v.0.5-1), assertthat(v.0.2.1), vroom(v.1.5.7), promises(v.1.2.0.1), scales(v.1.2.1), multcomp(v.1.4-17), beeswarm(v.0.4.0), gtable(v.0.3.0), multcompView(v.0.1-8), sandwich(v.3.0-1), rlang(v.1.1.1), systemfonts(v.1.0.4), scatterplot3d(v.0.3-41), dimensionsR(v.0.0.3), GlobalOptions(v.0.1.2), splines(v.4.0.3), rstatix(v.0.7.0), lazyeval(v.0.2.2), broom(v.1.0.1), yaml(v.2.2.1), abind(v.1.4-5), modelr(v.0.1.8), tidytext(v.0.3.4), crosstalk(v.1.1.1), backports(v.1.2.1), httpuv(v.1.6.2), tokenizers(v.0.2.1), tools(v.4.0.3), bookdown(v.0.24), pubmedR(v.0.0.3), ggplotify(v.0.0.9), ellipsis(v.0.3.2), jquerylib(v.0.1.4), RColorBrewer(v.1.1-3), latex2exp(v.0.9.4), plyr(v.1.8.6), Rcpp(v.1.0.8.3), rscopus(v.0.6.6), progress(v.1.2.2), prettyunits(v.1.1.1), openssl(v.1.4.4), zoo(v.1.8-9), haven(v.2.4.3), ggrepel(v.0.9.1), cluster(v.2.1.0), factoextra(v.1.0.7), fs(v.1.5.2), crul(v.1.3), magrittr(v.2.0.3), data.table(v.1.14.0), openxlsx(v.4.2.4), reprex(v.2.0.1), mvtnorm(v.1.1-3), SnowballC(v.0.7.0), bibliometrixData(v.0.3.0), hms(v.1.1.0), patchwork(v.1.1.1), mime(v.0.11), evaluate(v.0.14), xtable(v.1.8-4), XML(v.3.99-0.7), rio(v.0.5.29), shape(v.1.4.6), gridExtra(v.2.3), compiler(v.4.0.3), fontBitstreamVera(v.0.1.1), crayon(v.1.4.1), minqa(v.1.2.4), htmltools(v.0.5.2), ggfun(v.0.0.8), later(v.1.3.0), tzdb(v.0.1.2), aplot(v.0.1.8), lubridate(v.1.7.10), DBI(v.1.1.1), dbplyr(v.2.1.1), MASS(v.7.3-54), boot(v.1.3-28), cli(v.3.4.1), parallel(v.4.0.3), pkgconfig(v.2.0.3), flashClust(v.1.01-2), rncl(v.0.8.4), foreign(v.0.8-81), plotly(v.4.9.4.1), xml2(v.1.3.2), vipor(v.0.4.5), bslib(v.0.3.0), stringdist(v.0.9.7), estimability(v.1.3), rvest(v.1.0.1), yulab.utils(v.0.0.5), janeaustenr(v.1.0.0), digest(v.0.6.27), httpcode(v.0.3.0), rmarkdown(v.2.11), cellranger(v.1.1.0), tidytree(v.0.4.1), gdtools(v.0.3.1), curl(v.4.3.2), shiny(v.1.6.0), nloptr(v.1.2.2.2), lifecycle(v.1.0.3), nlme(v.3.1-151), jsonlite(v.1.7.2), askpass(v.1.1), fansi(v.0.5.0), pillar(v.1.8.1), ggsci(v.2.9), lattice(v.0.20-41), fastmap(v.1.1.0), httr(v.1.4.2), survival(v.3.2-7), glue(v.1.6.2), zip(v.2.2.0), bit(v.4.0.4), stringi(v.1.7.4), sass(v.0.4.0), textshaping(v.0.3.6), gfonts(v.0.2.0), memoise(v.2.0.0) and mathjaxr(v.1.2-0)

References

1.
J. R. Polanin, Q. Zhang, J. A. Taylor, R. T. Williams, M. Joshi, L. Burr, Evidence gap maps in education research. Journal of Research on Educational Effectiveness 16, 532–552 (2023).
2.
A. Hodkinson, P. Bower, C. Grigoroglou, S. S. Zghebi, H. Pinnock, E. Kontopantelis, M. Panagioti, Self-management interventions to reduce healthcare use and improve quality of life among patients with asthma: Systematic review and network meta-analysis. BMj 370 (2020).
3.
S. Mertens, M. Herberz, U. J. Hahnel, T. Brosch, The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. Proceedings of the National Academy of Sciences 119, e2107346118 (2022).
4.
D. Sanders, E. Frago, R. Kehoe, C. Patterson, K. J. Gaston, A meta-analysis of biological impacts of artificial light at night. Nature Ecology & Evolution 5, 74–81 (2021).
5.
T. C. Moulin, O. B. Amaral, Using collaboration networks to identify authorship dependence in meta-analysis results. Research Synthesis Methods 11, 655–668 (2020).
6.
T. M. Errington, M. Mathur, C. K. Soderberg, A. Denis, N. Perfito, E. Iorns, B. A. Nosek, Investigating the replicability of preclinical cancer biology. Elife 10, e71601 (2021).